On Nonlinear Learned String Indexing

Authors

Abstract

We investigate the potential of several artificial neural network architectures to be used as an index on a sorted set of strings, namely, as a mapping from a query string to (an estimate of) its lexicographic rank in the set, which allows solving some interesting string-search operations such as range and prefix searches. Our evaluation on a variety of real and synthetic datasets shows that learned models can beat the space vs error trade-off of classic (possibly compressed) trie-based solutions for relatively dense datasets only, while being slower to be trained and queried. This leads us to conclude that learned models are not yet competitive with classic trie-based solutions, and thus cannot completely replace them, but possibly only integrate them. Although our study does not settle the question conclusively, it highlights appropriate methods, provides a baseline for comparison, and introduces open problems, thereby serving as a starting point for future research.
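The idea of an index as a mapping from a query string to an estimate of its lexicographic rank can be illustrated with a minimal sketch. This is not the paper's method: in place of a neural network it uses a plain least-squares line fitted on a numeric encoding of the first bytes of each string, purely as an assumption for illustration. The names `LearnedStringIndex`, `encode`, `rank`, and `prefix_range` are hypothetical. The model's maximum error over the stored keys bounds a small window that a binary search then corrects, and a prefix search reduces to two rank queries.

```python
import math
from bisect import bisect_left


def encode(s, width=8):
    """Map a string to an integer that is non-decreasing in lexicographic order.

    Only the first `width` UTF-8 bytes are used, so distinct strings may collide.
    """
    b = s.encode("utf-8", "surrogatepass")[:width].ljust(width, b"\x00")
    return int.from_bytes(b, "big")


class LearnedStringIndex:
    """Hypothetical sketch: a linear model stands in for a learned (neural) model."""

    def __init__(self, keys):
        self.keys = sorted(set(keys))
        n = len(self.keys)
        xs = [float(encode(k)) for k in self.keys]
        self.slope, self.intercept, self.eps = 0.0, 0.0, 0.0
        if n:
            mean_x = sum(xs) / n
            mean_y = (n - 1) / 2
            var = sum((x - mean_x) ** 2 for x in xs)
            cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(xs))
            # Least-squares fit of rank against the encoded key.
            self.slope = max(0.0, cov / var) if var > 0 else 0.0
            self.intercept = mean_y - self.slope * mean_x
            # Maximum prediction error over the stored keys.
            self.eps = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, query):
        return self.slope * float(encode(query)) + self.intercept

    def rank(self, query):
        """Number of stored keys that are lexicographically smaller than `query`."""
        n = len(self.keys)
        guess = self._predict(query)
        lo = min(n, max(0, math.floor(guess - self.eps) - 1))
        hi = min(n, max(lo, math.ceil(guess + self.eps) + 1))
        pos = bisect_left(self.keys, query, lo, hi)
        # Verify the answer; fall back to a full binary search if the window missed.
        ok_left = pos == 0 or self.keys[pos - 1] < query
        ok_right = pos == n or self.keys[pos] >= query
        if not (ok_left and ok_right):
            pos = bisect_left(self.keys, query)
        return pos

    def prefix_range(self, prefix):
        """Half-open range [lo, hi) of positions of the keys that start with `prefix`."""
        lo = self.rank(prefix)
        p = prefix.rstrip(chr(0x10FFFF))
        if not p:
            return lo, len(self.keys)
        # Smallest string greater than every string that starts with `p`.
        successor = p[:-1] + chr(ord(p[-1]) + 1)
        return lo, self.rank(successor)


index = LearnedStringIndex(
    ["apple", "apply", "banana", "band", "bandana", "can", "candy", "cane"]
)
lo, hi = index.prefix_range("ban")
print(index.keys[lo:hi])
```

The final verification step keeps the sketch correct even if the error window misses the true position; the size of that window is the "error" side of the space vs error trade-off the abstract refers to.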

Similar resources

On Hardness of Several String Indexing Problems

Let D = {d1, d2, ..., dD} be a collection of D string documents of n characters in total. The two-pattern matching problems ask to index D for answering the following queries efficiently. – report/count the unique documents containing P1 and P2. – report/count the unique documents containing P1, but not P2. Here P1 and P2 represent input patterns of length p1 and p2 respectively. Linear space d...

Indexing for String Queries using

We present five index trees designed for supporting string searches. We discuss Counted Trees, Substring Trees, and Regular Expression Trees, all of which suffer the same problem: their attempts at approximating a large set of data lead to an almost complete lack of information in the interior nodes. A variant of B-Trees, called a Prefix-Suffix Tree, avoids this difficulty by severely restricting the ...

Lessons Learned From Indexing Close Word Pairs

We describe experiments with proximity-aware ranking functions that use indexing of word pairs. Our goal is to evaluate a method of “mild” pruning of proximity information, which would be appropriate for a moderately loaded retrieval system, e.g., an enterprise search engine. We create an index that includes occurrences of close word pairs, where one of the words is frequent. This allows one to...

Indexing Methods for Approximate String Matching

Indexing for approximate text searching is a novel problem receiving much attention because of its applications in signal processing, computational biology and text retrieval, to name a few. We classify most indexing methods in a taxonomy that helps understand their essential features. We show that the existing methods, rather than completely different as they are regarded, form a range of solut...

Journal

Journal title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3295434